Fix #29: Attempt extraction after parsing is finished, before loading. #143

Osmose · 2018-10-01T18:44:26Z

I tested running extraction on document_start and universally found that extraction wouldn't turn up anything useful (surprise, you need a fully-parsed DOM to meaningfully extract anything). document_end is pretty consistently returning good product info, which saves us having to wait for every resource to load. As far as I can tell there's not much opportunity to start work earlier than this.

I also couldn't help myself and did a minor refactor of the extraction module to be more in line with how the other code modules are organized. It's mostly renames.

biancadanforth

This is great, thanks Osmose. The refactor makes a lot of sense to me and seems to be working just fine. I tried this out on 7 random product pages, and I found that 6 of 7 extracted successfully at this earlier step.

I suggest two changes here, one of them very important: only call attemptExtraction a second time if the first attempt fails; as-is, we're running it twice on every page no matter what.

biancadanforth · 2018-10-04T17:55:03Z

src/extraction/index.js

-      date: (new Date()).toISOString(),
-    },
-  });
+  if (extractedProduct) {


Fallback extraction does not currently return null if it doesn’t find anything, so this would be a good time to change that before adding this existence check. (My bad)

We do also check for correct data types via propTypes further downstream, but it'd be nice to be consistent between the two extraction methods.
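To make the "return null when nothing is found" convention concrete, here is a minimal sketch of a selector-based extractor. The `FEATURE_SELECTORS` map, its selectors, and the `extractWithSelectors` name are hypothetical and not taken from this PR; `root` only needs a `querySelector` method.

```javascript
// Hypothetical feature-to-selector map; the real selectors are not shown in this thread.
const FEATURE_SELECTORS = {
  title: '#productTitle',
  price: '.price',
};

/**
 * Return an object with one entry per feature, or null as soon as any
 * feature's element is missing, so callers can rely on a simple
 * `if (extractedProduct)` existence check.
 */
function extractWithSelectors(root, selectors = FEATURE_SELECTORS) {
  const extractedProduct = {};
  for (const [feature, selector] of Object.entries(selectors)) {
    const element = root.querySelector(selector);
    if (!element) {
      return null;
    }
    extractedProduct[feature] = element.textContent.trim();
  }
  return extractedProduct;
}
```

With this shape, a partially matched page yields null rather than a half-filled object, which is what the existence check in the diff above relies on.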

biancadanforth · 2018-10-04T18:34:51Z

src/extraction/index.js

-  if (document.readyState === 'complete') {
+  // Extract immediately, and again if the readyState changes.
+  attemptExtraction();
+  document.addEventListener('readystatechange', () => {


We probably only want to add this listener and attempt extraction again if the first attempt failed.

Osmose · 2018-10-08

The idea was to perform a second extraction in case JavaScript has modified the page and changed the product info, to ensure we get the correct product info every time. I'm not sure how common this is, though, or how performing two extractions affects the user's experience.

I think we should keep it, but can be convinced otherwise.

biancadanforth · 2018-10-09

Ah hmm... so the idea is to see if we can get something at all to put in the popup sooner, and then update it with the second round of extraction...

Perhaps we could add a scalar probe that increments when the results of the first and second extractions differ, and compare that to the number of extraction attempts? Might be a wishlist probe.

What do you think about that idea @Osmose?

Osmose · 2018-10-09

Yeah I wouldn't bother adding a probe like that initially.
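For comparison, the reviewer's original suggestion (retry only when the first attempt fails) could look roughly like the sketch below. This is not the merged behavior, which, per the reply above, keeps the second extraction. The `extractWithRetry` name and the `doc`, `extract`, and `onProduct` parameters are assumptions made so the logic can run outside a browser.

```javascript
/**
 * Attempt extraction immediately. Only if that attempt returns null,
 * listen for readystatechange and try once more when the document
 * reaches the 'complete' state.
 *
 * doc:       anything with readyState, addEventListener, removeEventListener
 * extract:   function returning a product object or null
 * onProduct: callback invoked with the first successful result
 */
function extractWithRetry(doc, extract, onProduct) {
  const first = extract();
  if (first) {
    onProduct(first);
    return;
  }

  const onReadyStateChange = () => {
    if (doc.readyState !== 'complete') {
      return;
    }
    // One retry is enough; stop listening before extracting again.
    doc.removeEventListener('readystatechange', onReadyStateChange);
    const second = extract();
    if (second) {
      onProduct(second);
    }
  };
  doc.addEventListener('readystatechange', onReadyStateChange);
}
```

The trade-off discussed in the thread still applies: this variant avoids a second pass on pages where the early attempt succeeds, but it would miss product info that page scripts change after parsing.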

Osmose · 2018-10-08T18:05:05Z

I refactored extraction to return null in cases where the selectors or tags for all features aren't found. I also split open graph and selector-based extraction into two separate modules. In the future, if we add optional extraction features, we'll need to modify these, but that's fine.
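Once every extraction method returns null on failure, the methods can be chained generically. The sketch below assumes the methods are tried in sequence and the first non-null result wins; the order and the `runExtractors` name are illustrative, not taken from the PR.

```javascript
/**
 * Try each extraction function in order and return the first non-null
 * result, or null if none of them finds a product.
 */
function runExtractors(extractors) {
  for (const extract of extractors) {
    const product = extract();
    if (product !== null) {
      return product;
    }
  }
  return null;
}
```

Adding another method then only means appending a function that follows the same null-on-failure contract.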

biancadanforth

Thanks Osmose! Looks great.

I found a bug that I introduced earlier in the Open Graph extraction and just filed an issue for it. Will fix it shortly.

biancadanforth · 2018-10-09T19:08:40Z

src/extraction/open_graph.js

+      return null;
+    }
+
+    extractedProduct[feature] = metaEle.getAttribute('content');


There's a bug here: the propType for extractedProduct expects the price value to be a Number downstream, but getAttribute('content') returns a string. I took care of this for fallback and Fathom extraction earlier, but I neglected to add it here. I filed a follow-up issue, #154. I'll take care of that right now.
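A rough sketch of the kind of conversion this needs is below. The actual fix is in #155 and is not shown in this thread, so the `parsePrice` name and the parsing rules here are assumptions.

```javascript
/**
 * Convert a price string such as "19.99" or "$1,299.99" to a Number.
 * Returns null when no numeric value can be recovered.
 *
 * Assumption: "." is the decimal separator and "," only groups thousands,
 * so a value formatted like "19,99" would be misread as 1999.
 */
function parsePrice(content) {
  if (typeof content !== 'string') {
    return null;
  }
  // Drop currency symbols, thousands separators, and whitespace.
  const cleaned = content.replace(/[^0-9.]/g, '');
  const value = Number.parseFloat(cleaned);
  return Number.isNaN(value) ? null : value;
}
```

Returning null for unparseable input keeps this consistent with the null-on-failure convention used by the extraction methods.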


Osmose requested a review from biancadanforth October 1, 2018 18:44

Osmose force-pushed the robust-extraction branch from d0e07cc to f8ed9d3 Compare October 2, 2018 23:46

biancadanforth suggested changes Oct 4, 2018


Osmose requested a review from biancadanforth October 8, 2018 18:03

biancadanforth approved these changes Oct 9, 2018


biancadanforth mentioned this pull request Oct 9, 2018

Fix #154: Parse Open Graph price string to a number #155

Merged

Michael Kelly added 3 commits October 10, 2018 10:27

Mild refactor of the extraction module.

02c220d

Fix #29: Attempt extraction after parsing is finished, before loading.

3af22ed

Split extraction methods and ensure they return null when no match is found. Open Graph and selector-based extraction are now separate forms of extraction instead of a single, "fallback" extraction method.

150cd9b

Osmose force-pushed the robust-extraction branch from f5309a3 to 150cd9b Compare October 10, 2018 17:27

Osmose merged commit 644b70c into master Oct 10, 2018

Osmose deleted the robust-extraction branch October 10, 2018 17:32

biancadanforth mentioned this pull request Oct 23, 2018

The "Add This Product" button is not actionable on Walmart and Home Depot after the page is fully loaded #192

Closed

biancadanforth mentioned this pull request Aug 2, 2019

Make Fathom extraction more performant #319

Open


